Dataset statistics
| Number of variables | 4 |
|---|---|
| Number of observations | 100000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 18.5 MiB |
| Average record size in memory | 193.8 B |
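
The two memory figures above can be cross-checked against each other. The following is a minimal sketch, assuming the report uses binary units (1 MiB = 2**20 bytes); the variable names are illustrative and not part of the report.

```python
# Values copied from the "Dataset statistics" table.
n_observations = 100_000
avg_record_bytes = 193.8  # "Average record size in memory", in bytes

# Total size implied by the per-record average.
# ASSUMPTION: MiB is the binary unit, 1 MiB = 2**20 bytes.
total_bytes = n_observations * avg_record_bytes
total_mib = total_bytes / 2**20

print(f"{total_mib:.1f} MiB")  # compare with "Total size in memory"
```

Rounded to one decimal place, the result agrees with the reported total size.
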
Variable types
| Numeric | 3 |
|---|---|
| Categorical | 1 |
Reproduction
| Analysis started | 2024-02-18 15:32:46.607570 |
|---|---|
| Analysis finished | 2024-02-18 15:44:13.580982 |
| Duration | 11 minutes and 26.97 seconds |
| Software version | ydata-profiling v4.6.4 |
| Download configuration | config.json |
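
The reported duration follows directly from the two timestamps. A minimal sketch using only the Python standard library, with the timestamp format string chosen here to match the values in the table:

```python
from datetime import datetime

# Format chosen to match the timestamps in the "Reproduction" table.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-02-18 15:32:46.607570", FMT)
finished = datetime.strptime("2024-02-18 15:44:13.580982", FMT)

# Split the elapsed time into whole minutes and remaining seconds.
elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```
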
user_id
Real number (ℝ)
| Distinct | 943 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 462.48475 |
| Minimum | 1 |
| Maximum | 943 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 46 |
| Q1 | 254 |
| median | 447 |
| Q3 | 682 |
| 95-th percentile | 892 |
| Maximum | 943 |
| Range | 942 |
| Interquartile range (IQR) | 428 |
Descriptive statistics
| Standard deviation | 266.61442 |
|---|---|
| Coefficient of variation (CV) | 0.57648262 |
| Kurtosis | -1.0973667 |
| Mean | 462.48475 |
| Median Absolute Deviation (MAD) | 213 |
| Skewness | 0.082533291 |
| Sum | 46248475 |
| Variance | 71083.249 |
| Monotonicity | Not monotonic |
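
Several of the user_id statistics are derived from others in the same tables. The sketch below recomputes them from the reported values only (not from the raw data, which is not part of this report), so small differences from rounding are expected.

```python
# Reported values for user_id, copied from the tables above.
n = 100_000
mean = 462.48475
std = 266.61442
q1, q3 = 254, 682
minimum, maximum = 1, 943

derived = {
    "range": maximum - minimum,  # reported: 942
    "iqr": q3 - q1,              # reported: 428
    "cv": std / mean,            # reported: 0.57648262
    "variance": std ** 2,        # reported: 71083.249
    "sum": mean * n,             # reported: 46248475
}

for name, value in derived.items():
    print(f"{name}: {value:.8g}")
```

The same relationships (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count) apply to the item_id and time variables.
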
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 405 | 737 | 0.7% |
| 655 | 685 | 0.7% |
| 13 | 636 | 0.6% |
| 450 | 540 | 0.5% |
| 276 | 518 | 0.5% |
| 416 | 493 | 0.5% |
| 537 | 490 | 0.5% |
| 303 | 484 | 0.5% |
| 234 | 480 | 0.5% |
| 393 | 448 | 0.4% |
| Other values (933) | 94489 | 94.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 272 | 0.3% |
| 2 | 62 | 0.1% |
| 3 | 54 | 0.1% |
| 4 | 24 | < 0.1% |
| 5 | 175 | 0.2% |
| 6 | 211 | 0.2% |
| 7 | 403 | 0.4% |
| 8 | 59 | 0.1% |
| 9 | 22 | < 0.1% |
| 10 | 184 | 0.2% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 943 | 168 | 0.2% |
| 942 | 79 | 0.1% |
| 941 | 22 | < 0.1% |
| 940 | 107 | 0.1% |
| 939 | 49 | < 0.1% |
| 938 | 108 | 0.1% |
| 937 | 40 | < 0.1% |
| 936 | 142 | 0.1% |
| 935 | 39 | < 0.1% |
| 934 | 174 | 0.2% |
item_id
Real number (ℝ)
| Distinct | 1682 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 425.53013 |
| Minimum | 1 |
| Maximum | 1682 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.4 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 30 |
| Q1 | 175 |
| median | 322 |
| Q3 | 631 |
| 95-th percentile | 1074 |
| Maximum | 1682 |
| Range | 1681 |
| Interquartile range (IQR) | 456 |
Descriptive statistics
| Standard deviation | 330.79836 |
|---|---|
| Coefficient of variation (CV) | 0.7773794 |
| Kurtosis | 0.42253411 |
| Mean | 425.53013 |
| Median Absolute Deviation (MAD) | 196 |
| Skewness | 0.9863565 |
| Sum | 42553013 |
| Variance | 109427.55 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 50 | 583 | 0.6% |
| 258 | 509 | 0.5% |
| 100 | 508 | 0.5% |
| 181 | 507 | 0.5% |
| 294 | 485 | 0.5% |
| 286 | 481 | 0.5% |
| 288 | 478 | 0.5% |
| 1 | 452 | 0.5% |
| 300 | 431 | 0.4% |
| 121 | 429 | 0.4% |
| Other values (1672) | 95137 | 95.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 452 | 0.5% |
| 2 | 131 | 0.1% |
| 3 | 90 | 0.1% |
| 4 | 209 | 0.2% |
| 5 | 86 | 0.1% |
| 6 | 26 | < 0.1% |
| 7 | 392 | 0.4% |
| 8 | 219 | 0.2% |
| 9 | 299 | 0.3% |
| 10 | 89 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1682 | 1 | < 0.1% |
| 1681 | 1 | < 0.1% |
| 1680 | 1 | < 0.1% |
| 1679 | 1 | < 0.1% |
| 1678 | 1 | < 0.1% |
| 1677 | 1 | < 0.1% |
| 1676 | 1 | < 0.1% |
| 1675 | 1 | < 0.1% |
| 1674 | 1 | < 0.1% |
| 1673 | 1 | < 0.1% |
rating
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 MiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 300000 |
|---|---|
| Distinct characters | 7 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
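
As an illustration of how such character properties can be read, here is a minimal sketch using Python's standard `unicodedata` module on the five sample rows listed for `rating` in this section. It is not the profiler's own implementation. The general-category codes `Nd` and `Po` correspond to the names "Decimal Number" and "Other Punctuation" used in this report.

```python
import unicodedata
from collections import Counter

# The five sample rows shown for `rating`, treated as strings.
sample = ["3.0", "3.0", "1.0", "2.0", "1.0"]

# Count individual characters and their Unicode general categories.
chars = Counter(ch for value in sample for ch in value)
categories = Counter(
    unicodedata.category(ch) for value in sample for ch in value
)

print(sum(chars.values()))  # total characters in the sample
print(len(chars))           # distinct characters in the sample
print(dict(categories))     # counts per general category (Nd, Po)
```
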
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 3.0 |
|---|---|
| 2nd row | 3.0 |
| 3rd row | 1.0 |
| 4th row | 2.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.0 | 34174 | 34.2% |
| 3.0 | 27145 | 27.1% |
| 5.0 | 21201 | 21.2% |
| 2.0 | 11370 | 11.4% |
| 1.0 | 6110 | 6.1% |
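
The frequency column is each count divided by the number of observations. A minimal sketch that recomputes the percentages from the counts in the table, with the one-decimal rounding chosen to match the report's display:

```python
# Counts copied from the "Common Values" table for `rating`.
counts = {"4.0": 34174, "3.0": 27145, "5.0": 21201, "2.0": 11370, "1.0": 6110}

# The counts should add up to the number of observations (100000).
total = sum(counts.values())

# Relative frequency of each category, in percent.
frequencies = {value: 100 * count / total for value, count in counts.items()}

for value, pct in frequencies.items():
    print(f"{value}: {pct:.1f}%")
```
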
Length
Histogram of lengths of the category
Common Values (Plot)
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 100000 | 33.3% |
| 0 | 100000 | 33.3% |
| 4 | 34174 | 11.4% |
| 3 | 27145 | 9.0% |
| 5 | 21201 | 7.1% |
| 2 | 11370 | 3.8% |
| 1 | 6110 | 2.0% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 200000 | 66.7% |
| Other Punctuation | 100000 | 33.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 100000 | 50.0% |
| 4 | 34174 | 17.1% |
| 3 | 27145 | 13.6% |
| 5 | 21201 | 10.6% |
| 2 | 11370 | 5.7% |
| 1 | 6110 | 3.1% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 100000 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 300000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| . | 100000 | 33.3% |
| 0 | 100000 | 33.3% |
| 4 | 34174 | 11.4% |
| 3 | 27145 | 9.0% |
| 5 | 21201 | 7.1% |
| 2 | 11370 | 3.8% |
| 1 | 6110 | 2.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 300000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| . | 100000 | 33.3% |
| 0 | 100000 | 33.3% |
| 4 | 34174 | 11.4% |
| 3 | 27145 | 9.0% |
| 5 | 21201 | 7.1% |
| 2 | 11370 | 3.8% |
| 1 | 6110 | 2.0% |
time
Real number (ℝ)
| Distinct | 49282 |
|---|---|
| Distinct (%) | 49.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.8352885 × 10⁸ |
| Minimum | 8.7472471 × 10⁸ |
| Maximum | 8.9328664 × 10⁸ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 781.4 KiB |
Quantile statistics
| Minimum | 8.7472471 × 10⁸ |
|---|---|
| 5-th percentile | 8.7532031 × 10⁸ |
| Q1 | 8.7944871 × 10⁸ |
| median | 8.8282694 × 10⁸ |
| Q3 | 8.8825998 × 10⁸ |
| 95-th percentile | 8.9171789 × 10⁸ |
| Maximum | 8.9328664 × 10⁸ |
| Range | 18561928 |
| Interquartile range (IQR) | 8811274.5 |
Descriptive statistics
| Standard deviation | 5343856.2 |
|---|---|
| Coefficient of variation (CV) | 0.0060483098 |
| Kurtosis | -1.1687487 |
| Mean | 8.8352885 × 10⁸ |
| Median Absolute Deviation (MAD) | 3886481 |
| Skewness | 0.1738863 |
| Sum | 8.8352885 × 10¹³ |
| Variance | 2.8556799 × 10¹³ |
| Monotonicity | Not monotonic |
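
The magnitude of the `time` values suggests Unix timestamps, although the report itself types the column only as a real number. Under that assumption, the smallest and largest values listed in this section can be converted to calendar dates as sketched below.

```python
from datetime import datetime, timezone

# ASSUMPTION: `time` holds Unix timestamps (seconds since 1970-01-01 UTC).
# The report does not state this; it only profiles the column as numeric.
t_min, t_max = 874724710, 893286638  # smallest and largest listed values

first = datetime.fromtimestamp(t_min, tz=timezone.utc)
last = datetime.fromtimestamp(t_max, tz=timezone.utc)
span = last - first

print(first.isoformat())
print(last.isoformat())
print(span.days, "days")  # whole days covered by the reported range
```

If the assumption holds, the reported range of 18561928 is a span in seconds, and parsing the column as a datetime before profiling would give more readable statistics than treating it as a real number.
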
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 891033606 | 12 | < 0.1% |
| 879440109 | 10 | < 0.1% |
| 880822060 | 10 | < 0.1% |
| 879874388 | 10 | < 0.1% |
| 885546577 | 10 | < 0.1% |
| 891293606 | 10 | < 0.1% |
| 875428765 | 10 | < 0.1% |
| 891500028 | 10 | < 0.1% |
| 879966498 | 10 | < 0.1% |
| 888637768 | 10 | < 0.1% |
| Other values (49272) | 99898 | 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 874724710 | 1 | < 0.1% |
| 874724727 | 1 | < 0.1% |
| 874724754 | 1 | < 0.1% |
| 874724781 | 1 | < 0.1% |
| 874724843 | 1 | < 0.1% |
| 874724882 | 2 | < 0.1% |
| 874724905 | 1 | < 0.1% |
| 874724937 | 1 | < 0.1% |
| 874724988 | 1 | < 0.1% |
| 874725081 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 893286638 | 7 | < 0.1% |
| 893286637 | 3 | < 0.1% |
| 893286603 | 1 | < 0.1% |
| 893286584 | 1 | < 0.1% |
| 893286550 | 3 | < 0.1% |
| 893286511 | 2 | < 0.1% |
| 893286502 | 1 | < 0.1% |
| 893286501 | 3 | < 0.1% |
| 893286491 | 1 | < 0.1% |
| 893286373 | 1 | < 0.1% |
Correlations
| | item_id | rating | time | user_id |
|---|---|---|---|---|
| item_id | 1.000 | 0.260 | 0.027 | -0.007 |
| rating | 0.260 | 1.000 | -0.012 | 0.001 |
| time | 0.027 | -0.012 | 1.000 | 0.044 |
| user_id | -0.007 | 0.001 | 0.044 | 1.000 |
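
A matrix like this is easiest to read by ranking the off-diagonal entries by absolute value. The sketch below does that with the values copied from the table. The report does not say which correlation coefficient was used, so the code makes no assumption about it and only inspects the numbers.

```python
names = ["item_id", "rating", "time", "user_id"]
matrix = [
    [1.000, 0.260, 0.027, -0.007],
    [0.260, 1.000, -0.012, 0.001],
    [0.027, -0.012, 1.000, 0.044],
    [-0.007, 0.001, 0.044, 1.000],
]

size = len(names)

# A correlation matrix should be symmetric with ones on the diagonal.
symmetric = all(
    matrix[i][j] == matrix[j][i] for i in range(size) for j in range(size)
)

# Rank each unordered pair of variables by absolute correlation.
pairs = sorted(
    ((abs(matrix[i][j]), names[i], names[j])
     for i in range(size) for j in range(i + 1, size)),
    reverse=True,
)

strongest = pairs[0]
print(symmetric, strongest)
```

Only the item_id and rating pair stands out; every other pair has an absolute value below 0.05.
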
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | user_id | item_id | rating | time |
|---|---|---|---|---|
| 0 | 196 | 242 | 3.0 | 881250949 |
| 1 | 186 | 302 | 3.0 | 891717742 |
| 2 | 22 | 377 | 1.0 | 878887116 |
| 3 | 244 | 51 | 2.0 | 880606923 |
| 4 | 166 | 346 | 1.0 | 886397596 |
| 5 | 298 | 474 | 4.0 | 884182806 |
| 6 | 115 | 265 | 2.0 | 881171488 |
| 7 | 253 | 465 | 5.0 | 891628467 |
| 8 | 305 | 451 | 3.0 | 886324817 |
| 9 | 6 | 86 | 3.0 | 883603013 |
Last rows
| | user_id | item_id | rating | time |
|---|---|---|---|---|
| 99990 | 806 | 421 | 4.0 | 882388897 |
| 99991 | 676 | 538 | 4.0 | 892685437 |
| 99992 | 721 | 262 | 3.0 | 877137285 |
| 99993 | 913 | 209 | 2.0 | 881367150 |
| 99994 | 378 | 78 | 3.0 | 880056976 |
| 99995 | 880 | 476 | 3.0 | 880175444 |
| 99996 | 716 | 204 | 5.0 | 879795543 |
| 99997 | 276 | 1090 | 1.0 | 874795795 |
| 99998 | 13 | 225 | 2.0 | 882399156 |
| 99999 | 12 | 203 | 3.0 | 879959583 |